Linkage Disequilibrium-Based Quality Control for Large-Scale Genetic Studies
Authors
Abstract
Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of "problem" SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).
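To make the idea of using LD patterns for QC concrete, here is a minimal sketch of a much simpler heuristic than the model-based approach the abstract describes. It is not the fastPHASE method. It assumes genotypes are coded 0/1/2 with `None` for missing calls, and it flags a SNP when its two flanking neighbours are in strong LD with each other but not with the SNP itself. The function names and the `high`/`low` thresholds are hypothetical choices made for illustration only.

```python
from statistics import mean


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Individuals with a missing call (None) at either SNP are skipped.
    This is a common approximation to LD r^2 from unphased genotypes.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as no LD
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def flag_ld_outliers(snps, high=0.8, low=0.2):
    """Return indices of SNPs that break the local LD pattern.

    A SNP is flagged when its two flanking neighbours are in strong LD
    with each other (r^2 >= high) but each is in weak LD with the SNP
    itself (r^2 <= low). Thresholds are illustrative assumptions.
    """
    flagged = []
    for i in range(1, len(snps) - 1):
        left, mid, right = snps[i - 1], snps[i], snps[i + 1]
        if (genotype_r2(left, right) >= high
                and genotype_r2(left, mid) <= low
                and genotype_r2(mid, right) <= low):
            flagged.append(i)
    return flagged


# Toy data: the middle SNP is uncorrelated with two identical neighbours.
toy_snps = [
    [0, 1, 2, 0, 1, 2],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 2, 0, 1, 2],
]
suspect = flag_ld_outliers(toy_snps)
```

A flagged SNP is only a candidate for closer inspection; unlike the approach in the abstract, this sketch does not estimate error rates or correct genotypes.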
Similar resources
The Pattern of Linkage Disequilibrium in Livestock Genome
Linkage disequilibrium (LD) is the basis of genomic selection, genomic marker imputation, marker-assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing and whole-genome association studies. Particular alleles at close loci tend to be co-inherited. At linked loci, this pattern leads to an association between alleles in the population, which is known as LD. Two metr...
Linkage disequilibrium: ancient history drives the new genetics.
This brief review provides a summary of the biological causes of genetic association between tightly linked markers--termed linkage disequilibrium--and unlinked markers--termed population structure. We also review the utility of linkage disequilibrium data in gene mapping in isolated populations, in the estimation of recombination rates and in studying the history of particular alleles, includi...
Author's response to reviews Title: The effects of linkage disequilibrium in large scale SNP datasets for MDR Authors:
Title: The effects of linkage disequilibrium in large scale SNP datasets for MDR
False Discovery Rate Control for High Dimensional Dependent Data with an Application to Large-scale Genetic Association Studies
Large-scale genetic association studies are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. Most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibriu...
Evidence for a large-scale population structure among accessions of Arabidopsis thaliana: possible causes and consequences for the distribution of linkage disequilibrium.
The existence of a large-scale population structure was investigated in Arabidopsis thaliana by studying patterns of polymorphism in a set of 71 European accessions. We used sequence polymorphism surveyed in 10 fragments of approximately 600 nucleotides and a set of nine microsatellite markers. Population structure was investigated using a model-based inference framework. Among the accessions s...
Journal:
- PLoS Genetics
Volume 4, Issue
Pages -
Publication date: 2008